Answer the following questions:

Question No. 1 (15 marks)

For each of the following, please circle the letter of the best answer. Each one is
worth one mark:

1. Which word or phrase completes the statement? A spreadsheet is to a data island as a
centralized database is to a ______?

a) Data Warehouse
b) Data Repository
c) Analytic Sandbox
d) Data Mart

2. You are studying the behavior of a population, and you are provided with multidimensional
data at the individual level. You have identified four specific individuals who are valuable
to your study, and would like to find all users who are most similar to each individual.
Which algorithm is the most appropriate for this study?

a) Linear regression
b) Association rules
c) K-means clustering
d) Decision trees
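For reference, k-means clustering (option c) can be sketched as below: Lloyd's algorithm with the initial centroids placed at chosen individuals, so each cluster gathers the points nearest to one seed. The function name `kmeans`, the 2-D data and the two seeds (the question has four) are hypothetical, chosen only to keep the example small.

```python
import math

def kmeans(points, seeds, iterations=10):
    """Plain k-means (Lloyd's algorithm) with centroids initialised at `seeds`."""
    centroids = [list(s) for s in seeds]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for k, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, clusters

# Hypothetical data with two obvious groups, seeded at two individuals of interest.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
seeds = [(1.0, 1.0), (8.0, 8.0)]
centroids, clusters = kmeans(points, seeds)
print(clusters)
```

Seeding the centroids at the individuals of interest is one design choice; standard k-means would pick initial centroids at random or with k-means++.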

3. In which lifecycle stage is the analytic sandbox prepared?

a) Discovery
b) Model planning
c) Model building
d) Data preparation

4. When would you use a Wilcoxon Rank Sum test?

a) When the data can easily be sorted
b) When you cannot make an assumption about the distribution of the populations
c) When the populations represent the sums of other values
d) When the data cannot easily be sorted
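The Wilcoxon rank-sum test works on the ranks of the pooled observations rather than on their raw values, which is why it makes no assumption about the shape of the population distributions. A minimal sketch of its statistic W (the sum of the ranks of the first sample, with tied values given their average rank) follows; the sample values are hypothetical, and a complete test would still compare W with its null distribution, for example with `scipy.stats.ranksums`.

```python
def rank_sum(sample_a, sample_b):
    """Return W, the sum of the ranks of sample_a in the pooled data (ties get average ranks)."""
    pooled = sorted(sample_a + sample_b)
    # Map each distinct value to the average of the 1-based positions it occupies.
    avg_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return sum(avg_rank[x] for x in sample_a)

# Hypothetical samples from two populations.
a = [1.1, 2.3, 2.3, 4.0]
b = [3.5, 5.2, 6.1]
print(rank_sum(a, b))  # 11.0
```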

5. A data scientist wants to predict the probability of death from heart disease based on three
risk factors: age, gender, and blood cholesterol level. What is the most appropriate method
for this project?

a) Linear regression
b) Logistic regression
c) K-means clustering
d) Apriori algorithm
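Logistic regression (option b) models the log-odds of a binary outcome as a linear function of the inputs and maps it back through the logistic function, so its output always lies between 0 and 1. The sketch shows only that mapping; the coefficients, intercept and the 0/1 encoding of gender are invented for illustration and were not fitted to any data.

```python
import math

def predicted_probability(coefficients, intercept, features):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b . x)))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for (age in years, gender coded 0/1, cholesterol in mg/dL).
coefficients = [0.05, 0.4, 0.01]
intercept = -7.0
p = predicted_probability(coefficients, intercept, [60, 1, 240])
print(round(p, 3))
```

In practice the coefficients would be estimated by maximum likelihood from labelled patient records.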

6. Consider the example of an analysis for fraud detection on credit card usage. You will need
to ensure higher risk transactions that may indicate fraudulent credit card activity are
retained in your data for analysis, and not dropped as outliers during pre-processing. What
will be your approach for loading data into the analytical sandbox for this analysis?

a) ETL
b) EDW
c) ELT
d) OLTP

7. In which lifecycle stage are initial hypotheses formed?

a) Discovery
b) Model planning
c) Model building
d) Data preparation


8. A disk drive manufacturer has a defect rate of less than 2% with 98% confidence. A quality
assurance team samples 1000 disk drives and finds 14 defective units. Which action should
the team recommend?

a) The manufacturing process should be inspected for problems.
b) A larger sample size should be taken to determine if the plant is functioning properly.
c) A smaller sample size should be taken to determine if the plant is functioning properly.
d) The manufacturing process is functioning properly and no further action is required.
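The sample in question 8 can be checked with a one-sample test for a proportion. The sketch below compares the observed defect rate with the claimed 2% using the normal approximation; choosing a z-test rather than an exact binomial test is an assumption made for brevity.

```python
import math
from statistics import NormalDist

defects, n, claimed_rate = 14, 1000, 0.02

p_hat = defects / n  # observed defect rate
# Standard error under the null hypothesis that the true rate equals the claimed 2%.
std_err = math.sqrt(claimed_rate * (1 - claimed_rate) / n)
z = (p_hat - claimed_rate) / std_err
# One-sided p-value for the alternative "the true defect rate exceeds 2%".
p_value = 1 - NormalDist().cdf(z)
print(f"p_hat={p_hat:.3f}, z={z:.2f}, p-value={p_value:.3f}")
```

The observed rate is 1.4% and z is about -1.36, so this sample gives no evidence that the defect rate is above 2%.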

9. Which characteristic applies only to Business Intelligence as opposed to Data Science?

a) Supports solving "what if" scenarios
b) Uses large data sets
c) Uses only structured data
d) Uses predictive modeling techniques

10. Which activity might be performed in the Operationalize phase of the Data Analytics
Lifecycle?

a) Try different analytical techniques
b) Try different variables
c) Transform existing variables
d) Run a pilot

11. You are asked to create a model to predict the total number of monthly subscribers for a
specific magazine. You are provided with one year's worth of subscription and payment
data, user demographic data, and 10 years' worth of content of the magazine (articles and
pictures). Which algorithm is the most appropriate for building a predictive model for
subscribers?

a) Linear regression
b) Logistic regression
c) Decision trees
d) TF-IDF
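Linear regression (option a) predicts a continuous quantity, such as a monthly subscriber count, as a linear function of its inputs. A minimal one-predictor ordinary least squares sketch follows; the function name `fit_simple_ols` and the month and subscriber figures are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical subscriber counts for six consecutive months.
months = [1, 2, 3, 4, 5, 6]
subscribers = [1000, 1040, 1075, 1120, 1160, 1195]
a, b = fit_simple_ols(months, subscribers)
print(f"forecast for month 7: {a + b * 7:.0f}")
```

A real model for this question would use several predictors (payment and demographic variables), which calls for the multiple-regression form of the same least-squares idea.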

12. Your organization has a website where visitors randomly receive one of two coupons. It is
also possible that visitors to the website will not receive a coupon. You have been asked to
determine if offering a coupon to visitors to your website has any impact on their purchase
decision. Which analysis method should you use?

a) K-means clustering
b) Association rules
c) Student's t-test
d) One-way ANOVA
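One-way ANOVA (option d) compares the means of three or more groups, here coupon A, coupon B and no coupon, through the ratio of between-group to within-group variance. The sketch computes only the F statistic; the per-visitor purchase amounts are hypothetical, and the p-value would come from an F distribution with (k - 1, N - k) degrees of freedom (`scipy.stats.f_oneway` returns both).

```python
def one_way_anova_f(*groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical purchase amounts for visitors in each coupon condition.
coupon_a = [20, 22, 19, 24]
coupon_b = [25, 27, 26, 30]
no_coupon = [18, 17, 21, 19]
print(one_way_anova_f(coupon_a, coupon_b, no_coupon))
```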

13. When would you prefer a Naive Bayes model to a logistic regression model for
classification?

a) When you need to estimate the probability of an outcome, not just which class it is in
b) When all the input variables are numerical
c) When you are using several categorical input variables with over 1000 possible values
each
d) When some of the input variables might be correlated
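Naive Bayes assumes the inputs are conditionally independent given the class, so each categorical input needs only a table of per-class value counts, however many distinct values it has. A minimal categorical sketch with add-one (Laplace) smoothing follows; the training rows, feature values and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per feature, the value frequencies within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    distinct_values = defaultdict(set)   # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values

def predict(model, row):
    """Return the class maximising P(class) * prod P(value | class)."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for cls, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Add-one smoothing keeps an unseen value from zeroing the whole product.
            score *= (value_counts[(i, cls)][value] + 1) / (count + len(distinct_values[i]))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

rows = [("red", "small"), ("red", "large"), ("blue", "small"),
        ("blue", "large"), ("blue", "small")]
labels = ["yes", "yes", "no", "no", "no"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("red", "small")))  # yes
```

With thousands of categories per input, production code would usually sum log-probabilities instead of multiplying, to avoid floating-point underflow.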

14. Which data asset is an example of quasi-structured data?

a) Web clickstream data
b) XML data file
c) Database table
d) News article

15. What is an example of a null hypothesis?

a) that a newly created model does not provide better predictions than the currently
existing model
b) that a newly created model provides a prediction of a null sample mean
c) that a newly created model provides a prediction of a null population mean
d) that a newly created model provides a prediction that will be well fit to the null
distribution



[Scanned handwritten answer pages; the handwriting did not survive text extraction. Recognizable fragments include columns of numbers, the words "milk", "bread" and "Data Science", and probability notation such as P(c).]